ABSTRACT

A neural network algorithm, known as Kohonen Feature Maps, is used to generate the spatial pattern
classes for Spatiotemporal Pattern Recognition (SPR). Training vectors are presented to the network
one at a time, and the connection strengths between the input and output nodes are adaptively updated.
The adaptation process is accompanied by a decay of the adaptation rate as well as a shrinkage of the
neighborhood for updating. The final values of the connection strengths represent the centroids of
clusters of training patterns. The algorithm was tested with hypothetical data as well as hydrophone
data. Functional forms and constants for the decay and the shrinkage were determined empirically. The
algorithm performs better with broadband data than with narrowband data. It also works better with a
smaller number of pattern classes.

1  INTRODUCTION 

The classification of spatiotemporal patterns such as waterfall display-type data is common to many
feature recognition problems. An example of waterfall data is Fast Fourier Transformed speech data.
The data are buffered in a two-dimensional array with time represented on one axis and the frequency
bins on the other. For each new time instant, a new row is added to the buffer at one end, and another
row scrolls off the opposite end of the buffer. Relevant applications include recovery of communication
symbols, radar waveforms, sonar signals, speech recognition and vision systems [1]. Biological vision is
a spatiotemporal pattern recognition process involving the integration of many pattern fragments
resulting from sequences of eye movements (saccades) [2]. In the case of radar and sonar signals, large
numbers of basic template patterns, in the range of thousands to millions, have to be classified [3].
This area has traditionally been a research theme of mathematical statistics. The advent of
neurocomputer technology and its potential for direct implementation have inspired the design of many
neural network architectures to perform such pattern recognition tasks [3].
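The scrolling buffer described above can be sketched in a few lines. This is only an illustration: the names `make_waterfall` and `push_spectrum`, the buffer height of three rows and the four frequency bins are assumptions chosen for the example, not details of the system discussed in this paper.

```python
from collections import deque


def make_waterfall(num_rows):
    """Fixed-height buffer: appending a new row drops the oldest one."""
    return deque(maxlen=num_rows)


def push_spectrum(buffer, spectrum):
    """Add one time slice (a list of frequency-bin magnitudes) to the buffer."""
    buffer.append(list(spectrum))
    return buffer


# Keep three time slices; each slice has four frequency bins (hypothetical sizes).
waterfall = make_waterfall(3)
for t in range(5):
    push_spectrum(waterfall, [float(t)] * 4)

# Only the three most recent slices remain in the buffer.
print(list(waterfall))
```

Each row of the buffer is one spatial pattern; the sequence of rows over time forms the spatiotemporal pattern to be classified.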

Hecht-Nielsen devised a matched filter bank neural network architecture based on Grossberg’s avalanche
structure [3, 4 and 5]. To facilitate the implementation of the matched filter architecture,
Hecht-Nielsen [3] suggested automating the generation of pattern classes with self-organizing feature
maps [6 and 7]. The required self-organizing tasks are: 1) the determination of the spatial weight
vectors and 2) the organization of the temporal template.

The  first  task  was  accomplished  via  Kohonen’s  feature  maps  and  the  second  was  implemented  with  a 
heuristic  learning  rule.  The  arrangement  of  our  spatiotemporal  pattern  classifier  is  depicted  in  Figure  1. 
The quantization of the spatial weight vectors is described in Section 2. Section 3 summarizes experimental data.
Section  4  discusses  the  experimental  results  and  their  implications.  The  heuristic  learning  rule  as  well  as  the 
general  performance  of  the  avalanche  matched  filter  will  be  presented  in  a  future  paper. 
2 Learning Weight Vectors

In the avalanche matched filter architecture, the z vectors are the spatial components of the
spatiotemporal reference patterns $P_1, P_2, \ldots, P_n$. They are the spatial weight vectors distributed
over the pattern space, and they should sufficiently describe the pattern environment. These weight
vectors can be chosen manually from the training set patterns in an a priori manner, or they can be
generated by the processing elements based on the self-organization principle. Kohonen described a
learning algorithm that can efficiently perform vector quantization of the input pattern space, i.e.
the classification of all the input vectors. The Kohonen neural network clustering algorithm
[6, 7 and 8] is repeated here for completeness:

1. Given a neural network of size n x m, where n is the size of the input vector, i.e. the number of
elements of each input vector, and m is the number of processing elements in the avalanche matched
filter bank, randomly assign small values to the $z_{ij}$’s and normalize each $z_j$ to 1.

2. Let $Q(t) = \{q_1(t), q_2(t), \ldots, q_n(t)\}$ represent the training vector. Present a new vector
Q(t) to the input nodes. Note that the Q’s are normalized so that $|Q| = 1$.

3. Compute the distance $d_j$ between the input vector pattern and the current weight vector, i.e.:
$d_j = \sqrt{\sum_{i=1}^{n} (q_i(t) - z_{ij}(t))^2}$.

4. Select the processing element j* with minimum Euclidean distance $d_j$ as the center, i.e. c = j*.
Find a neighborhood of c, $N_c(t)$, by choosing the processing elements whose Euclidean distances d are
less than R(t) from j*. Update weight vectors within $N_c(t)$ by
$z_{ij}(t+1) = z_{ij}(t) + a(t) * (q_i(t) - z_{ij}(t))$.

5. Go to step 2 and repeat steps 3 and 4 until the weight vectors stop changing their values.
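One presentation of a training vector (steps 3 and 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the update is taken to be the standard Kohonen form, in which each weight moves toward the input by a fraction a(t) of the difference, and the neighborhood is taken to be the units whose index lies within R(t) of the winner on a one-dimensional chain. The function names and the three starting weight vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_step(weights, q, alpha, radius):
    """Present q once: pick the winner c, then update units with |j - c| < radius."""
    distances = [euclidean(q, z) for z in weights]
    c = min(range(len(weights)), key=distances.__getitem__)
    for j, z in enumerate(weights):
        if abs(j - c) < radius:  # assumed: neighborhood measured along the unit index
            weights[j] = [zi + alpha * (qi - zi) for zi, qi in zip(z, q)]
    return c


# Hypothetical starting weights, normalized to unit length as in step 1.
weights = [normalize(w) for w in ([1.0, 0.1, 0.1, 0.1],
                                  [0.1, 1.0, 0.1, 0.1],
                                  [0.1, 0.1, 1.0, 0.1])]
q = normalize([0.0, 3.0, 2.0, 0.0])

before = euclidean(q, weights[1])
winner = train_step(weights, q, alpha=0.5, radius=1.0)
after = euclidean(q, weights[1])
print(winner, before, after)
```

With a radius of 1 only the winning unit is updated, and an adaptation rate of 0.5 halves its distance to the input.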

The adaptation parameter a(t) governs the speed of convergence toward the asymptotic values of the
$z_j$’s. R(t) determines the influence range of each weight vector $z_j$. As suggested by Kohonen, both
a(t) and R(t) should decrease monotonically in time [8]. In our experiment a(t) is given by
$a(t) = K_a e^{-t/T_a} < 1$, where $K_a$ is a constant for the maximum amount of adaptation and $T_a$ is
the constant governing the decreasing rate of a(t) during the presentation session. The size of
$N_c(t)$ is determined by the empirical function R(t), which is defined as
$R(t) = R_0 + K_r e^{-t/T_r}$, where $R_0$ and $K_r$ are constants and $T_r$ governs the shrinking rate.

3  Experimental  Results 

Only tests with synthesized data are presented in this section. Test results with hydrophone data are
not included here due to the nature of the data. However, we will draw some general conclusions from
those tests in Section 4. Eight vectors are used for the experiment. Each vector has four elements (see
Table 1 and Figure 2). Note that the fourth vector and the last vector are the same. This choice was
made on purpose to test the performance of Kohonen’s feature map technique on categorizing data. Even
though the data set is parsimonious, it revealed some interesting characteristics of the Kohonen
Feature Maps technique. A series of experiments was conducted to perform the vector quantization as
described in [7]. The experiments are to determine a set of appropriate values for the parameters used
in Section 2 as well as to evaluate the performance of Kohonen Feature Maps on small data sets. We
hypothesized that:

1. An ideal vector quantizer should produce a set of weight vectors which is identical to the training
vectors if the number of categories is the same as the number of the training vectors;

2. For identical vectors, the vector quantizer should coalesce them into one category.
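As a small check on the training set itself, the sketch below normalizes the unnormalized patterns of Table 1 to unit length and counts the distinct patterns. The row order follows Table 1 as printed; the helper name `normalize` is an assumption of this example.

```python
import math

# Unnormalized training patterns transcribed from Table 1 (one row per vector).
TRAINING = [
    [0.0, 3.0, 2.0, 0.0],
    [0.0, 4.0, 6.0, 8.0],
    [2.0, 2.0, 1.0, 1.0],
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 3.5, 2.0, 0.0],
    [0.0, 3.0, 1.5, 0.2],
    [1.2, 2.1, 2.7, 4.0],
    [2.0, 2.0, 1.0, 1.0],
]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


normalized = [normalize(v) for v in TRAINING]
for row in normalized:
    print("  ".join(f"{x:.3f}" for x in row))

# The pattern (2, 2, 1, 1) occurs twice, so only seven patterns are distinct.
distinct = {tuple(v) for v in TRAINING}
print(len(distinct))
```

Under the second hypothesis, the two copies of the repeated pattern should end up in a single category.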

As pointed out by Kohonen [6, p. 132], the form of the algorithm used is a choice for mathematical
simplicity. Therefore, we do not expect to find the ’optimum’ values of the parameters. The values used
in these experiments are purely empirical. The following parameters are used: $a = K_a * e^{-t/T_a}$ and
$R = 1 + (m - 2) * e^{-t/T_r}$, where m is the number of training vectors; $K_a$ in most cases is 0.9;
$T_a$ is the decay constant of the learning rate, which ranges from 1000 to 10000; and $T_r$ is the
shrink constant for the updating neighborhood. Its magnitude is about 100.
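To get a feel for these constants, the sketch below evaluates the two schedules at a few presentation counts. It assumes the exponential forms a(t) = Ka * exp(-t/Ta) and R(t) = 1 + (m - 2) * exp(-t/Tr), with Ka = 0.9, Ta = 1000, Tr = 100 and m = 8; the function names are hypothetical.

```python
import math


def learning_rate(t, k_a=0.9, t_a=1000.0):
    """Assumed form a(t) = Ka * exp(-t / Ta)."""
    return k_a * math.exp(-t / t_a)


def neighborhood_radius(t, m=8, t_r=100.0):
    """Assumed form R(t) = 1 + (m - 2) * exp(-t / Tr)."""
    return 1.0 + (m - 2) * math.exp(-t / t_r)


for t in (0, 100, 500, 1000):
    print(t, round(learning_rate(t), 4), round(neighborhood_radius(t), 4))
```

Under these assumptions the radius starts at m - 1 and approaches 1 after a few hundred presentations, while the learning rate is still a sizeable fraction of Ka at t = 1000.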

Table 2 shows the weight vectors generated for different numbers of training vectors. Note that the
generated weight vectors do not match the training vectors except for n = 3. Table 3 shows the eight
training vectors classified into categories of various coarseness. Note that the input vectors could be
grouped visually into 4 classes: 1, 5 and 6; 3 and 7; 4 and 8; and 2 by itself. Kohonen’s algorithm
seems to perform rather well with such classifications (see Table 3 and Figure 2). Also note that
Kohonen’s algorithm was designed to classify large amounts of image data rather than simplified data
such as those in our experiments. Nonetheless, simple data tests provide an effective means to evaluate
the performance of the algorithm.
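The grouping can be checked by assigning each normalized training vector to its nearest weight vector. The sketch below uses the c = 3 weight vectors as transcribed from Table 3 (b) and the normalized training vectors numbered 1 to 8 as in the text; the nearest-neighbor assignment and the name `nearest` are assumptions of this example rather than a procedure stated in the paper.

```python
import math

# Normalized training vectors, numbered 1 to 8 as in the text.
PATTERNS = [
    [0.000, 0.832, 0.555, 0.000],
    [0.000, 0.371, 0.557, 0.743],
    [0.183, 0.365, 0.548, 0.730],
    [0.632, 0.632, 0.316, 0.316],
    [0.000, 0.866, 0.496, 0.000],
    [0.000, 0.893, 0.446, 0.060],
    [0.222, 0.389, 0.500, 0.741],
    [0.632, 0.632, 0.316, 0.316],
]

# Weight vectors for c = 3 as transcribed from Table 3 (b).
WEIGHTS = [
    [0.157, 0.377, 0.529, 0.738],
    [0.632, 0.632, 0.316, 0.316],
    [0.000, 0.871, 0.488, 0.036],
]


def nearest(q, weights):
    """Index of the weight vector closest to q in Euclidean distance."""
    dists = [math.dist(q, z) for z in weights]
    return dists.index(min(dists))


labels = [nearest(q, WEIGHTS) for q in PATTERNS]
groups = {}
for number, label in enumerate(labels, start=1):
    groups.setdefault(label, []).append(number)
print(groups)
```

With three classes, vectors 1, 5 and 6 share one unit and vectors 4 and 8 share another, while vector 2 is absorbed into the unit that also represents vectors 3 and 7.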

Results in Tables 2 and 3 were obtained after a large number of trials by adjusting the parameters.
During the experimentation, we attempted to decrease the learning rate as well as the neighborhood
shrink rate linearly. Neither performed as well as the exponentially decreased rates. We also
experimented with assigning different initial values to the weight vectors. The algorithm generates
consistent weight vectors regardless of the initial values, provided the number of presentations is
sufficiently large. However, the weight vectors are sensitive to the presentation sequence of the
training vectors, as shown in Table 4.
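The order dependence is easy to see on a single weight vector. The sketch below is an illustration only: it applies the update z <- z + a * (q - z) with a constant rate of 0.9 to one unit, using two of the normalized training vectors in both orders. The starting weights and the function name `present` are assumptions.

```python
def present(z, sequence, alpha):
    """Apply z <- z + alpha * (q - z) for each q in the given order."""
    for q in sequence:
        z = [zi + alpha * (qi - zi) for zi, qi in zip(z, q)]
    return z


q_a = [0.183, 0.365, 0.548, 0.730]
q_b = [0.632, 0.632, 0.316, 0.316]
start = [0.5, 0.5, 0.5, 0.5]  # hypothetical initial weights

z_ab = present(start, [q_a, q_b], alpha=0.9)
z_ba = present(start, [q_b, q_a], alpha=0.9)
print(z_ab)
print(z_ba)
```

With a high adaptation rate the most recently presented vector dominates the result, so reversing the order changes the final weights.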


4  Discussions  and  Conclusions 

The  values  of  constants  were  obtained  through  many  iterations.  Generally,  the  neighborhood  size  should 
start  with  one  neuron  less  than  the  number  of  classes;  the  shrink  constant  Tr  should  be  less  than  50%  of  the 
planned  presentation  time  and  the  decay  constant  Ta  is  about  10%  of  the  shrink  constant  Tr . 

Our experimental results have shown some interesting comparisons with those reported by Nasrabadi and
Feng [7] on image compression:

1. Our learning rate constant is much higher than that used in image compression, 0.9 vs. 0.1. Our
conjecture is that this learning constant is inversely proportional to the size of the training data.
This might be because faster adaptation can ’freeze’ the weight vectors into suboptimal values. Such a
phenomenon is especially likely with a large amount of training data.

2. The time constant for the decreasing learning rate can be as low as 500, while in image compression
a value of 10,000 is typical. Again, we contend that this constant is size dependent.

3. The time constants for the shrink rate of the affected neighborhood in both cases are about 100. Note
that this value is somewhat size independent.

4. The neighborhood constant m - 2, where m is the number of elements, reflects the adjustment of the
size of the updating neighborhood as the number of training vectors changes.

In Table 2 (a), seven patterns are classified into eight weight vectors, which differ from the training
vectors. This is not a problem if, out of all eight vectors, seven match the input (or training)
vectors. Generally, if one tries to overclassify the input vectors, one may end up with redundant
vectors or vectors which do not belong to any class. The case in which the number of weight vectors
exceeds the number of categories has no practical application.
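Redundant weight vectors of this kind can be found mechanically. The sketch below lists pairs of weight vectors that coincide to within a small tolerance, using the n = 8 values as transcribed from Table 2 (a); the tolerance of 0.001 and the function name `redundant_pairs` are assumptions chosen for the example.

```python
import math

# Weight vectors for n = 8 as transcribed from Table 2 (a).
WEIGHTS = [
    [0.000, 0.371, 0.557, 0.743],
    [0.183, 0.365, 0.548, 0.730],
    [0.222, 0.369, 0.500, 0.741],
    [0.832, 0.632, 0.316, 0.316],
    [0.008, 0.887, 0.450, 0.056],
    [0.008, 0.887, 0.450, 0.056],
    [0.000, 0.851, 0.524, 0.000],
    [0.000, 0.893, 0.446, 0.060],
]


def redundant_pairs(weights, tol=1e-3):
    """Return index pairs whose weight vectors lie within tol of each other."""
    return [(i, j)
            for i in range(len(weights))
            for j in range(i + 1, len(weights))
            if math.dist(weights[i], weights[j]) < tol]


print(redundant_pairs(WEIGHTS))
```

Counting from zero, the fifth and sixth rows of Table 2 (a) are the only pair flagged at this tolerance.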

In conclusion, the fewer the classes, the better the algorithm performs. The hypothetical data tests
provide an effective means to evaluate the algorithm. The resulting weight vectors are more sensitive
to the neighborhood parameters, Tr and Kr, than to the parameters used in the adaptation process, i.e.
Ta and Ka. The fact that a wide range of values can be assigned to the adaptation parameters implies
that there is no unique set of parametric values for optimum weight vectors. Nasrabadi and Feng [7]
suggested that the optimum weight vectors might be obtained if the adaptation and the neighborhood
parameters were decreased very slowly. However, our experiments indicate that slowly decreasing those
parameters does not guarantee the optimum weight vectors even for a small training set with only four
patterns. This would be even more so in the case of a large number of training patterns. Such a
limitation can be attributed to the simplified formalism, as pointed out by Kohonen [6]. Nonetheless,
the fact that optimum weight vectors cannot be readily determined does not limit the use of Kohonen’s
Feature Maps for vector quantization, so long as the algorithm is used for high-level feature
extraction or categorization.

As for the testing with hydrophone data, we concluded that the algorithm classifies the broadband data
better than it does the narrowband data. Although the data are not presented here, one can easily
verify this by examining the classification between the vectors with and without zero elements. Another
lesson from using Kohonen Feature Maps for hydrophone data testing is that it is better not to
normalize the training vectors for clustering. Patterns tend to lose their features once they are
normalized. This is because normalization is often a process of variance attenuation. However, the
weight vectors z have to be normalized prior to their use in the SPR procedures. The fact that the
weight vectors are affected by the sequence of presentation has significant implications for SPR.
Inconsistent weight vectors will be generated for different training events of the same class. Such
inconsistency could eventually affect the overall performance of spatiotemporal pattern recognition.
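The remark above about normalization can be illustrated with two hypothetical patterns that have the same shape but different amplitudes. The vectors `weak` and `strong` below are invented for this example and are not hydrophone data.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


weak = [0.5, 1.0, 1.5, 2.0]
strong = [2.0, 4.0, 6.0, 8.0]  # same shape, four times the amplitude

# Distance between the raw patterns, then between their normalized versions.
print(math.dist(weak, strong))
print(math.dist(normalize(weak), normalize(strong)))
```

The raw patterns are far apart, but after normalization they coincide, so any feature carried by amplitude is no longer available for clustering.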
Figure 1  Spatiotemporal Pattern Recognition Flow Diagram
Figure 2  Normalized Training Patterns


Figure  3  Weight  vectors  generated  by  Kohonen's  Feature  maps 
(a) Unnormalized

qt[0,0]=0.00  qt[1,0]=3.00  qt[2,0]=2.00  qt[3,0]=0.00
qt[0,1]=0.00  qt[1,1]=4.00  qt[2,1]=6.00  qt[3,1]=8.00
qt[0,2]=2.00  qt[1,2]=2.00  qt[2,2]=1.00  qt[3,2]=1.00
qt[0,3]=1.00  qt[1,3]=2.00  qt[2,3]=3.00  qt[3,3]=4.00
qt[0,4]=0.00  qt[1,4]=3.50  qt[2,4]=2.00  qt[3,4]=0.00
qt[0,5]=0.00  qt[1,5]=3.00  qt[2,5]=1.50  qt[3,5]=0.20
qt[0,6]=1.20  qt[1,6]=2.10  qt[2,6]=2.70  qt[3,6]=4.00
qt[0,7]=2.00  qt[1,7]=2.00  qt[2,7]=1.00  qt[3,7]=1.00

Table 1. Training patterns.


(a) n = 8

z[0,0]=0.000  z[1,0]=0.371  z[2,0]=0.557  z[3,0]=0.743
z[0,1]=0.183  z[1,1]=0.365  z[2,1]=0.548  z[3,1]=0.730
z[0,2]=0.222  z[1,2]=0.369  z[2,2]=0.500  z[3,2]=0.741
z[0,3]=0.832  z[1,3]=0.632  z[2,3]=0.316  z[3,3]=0.316
z[0,4]=0.008  z[1,4]=0.887  z[2,4]=0.450  z[3,4]=0.056
z[0,5]=0.008  z[1,5]=0.887  z[2,5]=0.450  z[3,5]=0.056
z[0,6]=0.000  z[1,6]=0.851  z[2,6]=0.524  z[3,6]=0.000
z[0,7]=0.000  z[1,7]=0.893  z[2,7]=0.446  z[3,7]=0.060

(b) n = 3

z[0,0]=0.000  z[1,0]=0.371  z[2,0]=0.557  z[3,0]=0.743
z[0,1]=0.632  z[1,1]=0.632  z[2,1]=0.316  z[3,1]=0.316
z[0,2]=0.000  z[1,2]=0.832  z[2,2]=0.555  z[3,2]=0.000


Table 2. Weight vectors generated by Kohonen’s Feature Maps for the first n training vectors in Table 1; n is reduced from eight to three. Ka = 0.90, Ta
= 1000, Tr = 100. Note that the number of classes is the same as the number of training vectors. Note also that the weight vectors generated are not
identical to the training vectors except for n = 3. The redundant vectors in (a) are due to the over-classification of the input vectors.


(a) c = 7

z[0,0]=0.000  z[1,0]=0.371  z[2,0]=0.557  z[3,0]=0.743
z[0,1]=0.183  z[1,1]=0.365  z[2,1]=0.548  z[3,1]=0.730
z[0,2]=0.222  z[1,2]=0.389  z[2,2]=0.500  z[3,2]=0.741
z[0,3]=0.632  z[1,3]=0.632  z[2,3]=0.316  z[3,3]=0.316
z[0,4]=0.006  z[1,4]=0.887  z[2,4]=0.450  z[3,4]=0.056
z[0,5]=0.000  z[1,5]=0.851  z[2,5]=0.524  z[3,5]=0.000
z[0,6]=0.000  z[1,6]=0.893  z[2,6]=0.446  z[3,6]=0.060

(b) c = 3

z[0,0]=0.157  z[1,0]=0.377  z[2,0]=0.529  z[3,0]=0.738
z[0,1]=0.632  z[1,1]=0.632  z[2,1]=0.316  z[3,1]=0.316
z[0,2]=0.000  z[1,2]=0.871  z[2,2]=0.488  z[3,2]=0.036


Table 3. Weight vectors generated by Kohonen’s Feature Maps for all eight input vectors in Table 1 clustered into different numbers of classes, c. Here the
number of classes is less than the number of the training vectors, n. Ka = 0.90, Ta = 10000, Tr = 100. Note that the weight vectors are not identical to
the training vectors when c = n = 8; however, the algorithm does appear to extract their features as the number of classes, c, is reduced.


q[0]=0.000  q[1]=0.832  q[2]=0.555  q[3]=0.000
q[0]=0.000  q[1]=0.371  q[2]=0.557  q[3]=0.743
q[0]=0.183  q[1]=0.365  q[2]=0.548  q[3]=0.730
q[0]=0.632  q[1]=0.632  q[2]=0.316  q[3]=0.316
q[0]=0.000  q[1]=0.866  q[2]=0.496  q[3]=0.000
q[0]=0.000  q[1]=0.893  q[2]=0.446  q[3]=0.060
q[0]=0.222  q[1]=0.389  q[2]=0.500  q[3]=0.741
q[0]=0.632  q[1]=0.632  q[2]=0.316  q[3]=0.316


(a) c = 7

z[0,0]=0.633  z[1,0]=0.632  z[2,0]=0.316  z[3,0]=0.316
z[0,1]=0.567  z[1,1]=0.605  z[2,1]=0.341  z[3,1]=0.361
z[0,2]=0.000  z[1,2]=0.371  z[2,2]=0.557  z[3,2]=0.743
z[0,3]=0.163  z[1,3]=0.365  z[2,3]=0.548  z[3,3]=0.730
z[0,4]=0.332  z[1,4]=0.369  z[2,4]=0.500  z[3,4]=0.741
z[0,5]=0.196  z[1,5]=0.447  z[2,5]=0.495  z[3,5]=0.661
z[0,6]=0.000  z[1,6]=0.867  z[2,6]=0.494  z[3,6]=0.033

(b) c = 3

z[0,0]=0.632  z[1,0]=0.633  z[2,0]=0.316  z[3,0]=0.316
z[0,1]=0.144  z[1,1]=0.376  z[2,1]=0.533  z[3,1]=0.738
z[0,2]=0.000  z[1,2]=0.867  z[2,2]=0.494  z[3,2]=0.022


Table 4. Same training patterns, except that the third and the fourth vectors are reversed during the training session. Ka = 0.90, Ta = 10000, Tr = 100.
Note that the weight vectors are different from the ones generated in Table 3 even with the same parameter set.